
    Load Balancing Regular Meshes on SMPS with MPI

    Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors (now cores). However, these strategies often do not consider the inherent system noise, which can hinder MPI application scalability on emerging peta-scale machines with 10000+ nodes. In this work, we suggest a solution that uses a tunable hybrid static/dynamic scheduling strategy that can be incorporated into current MPI implementations of mesh codes. By applying this strategy to a 3D Jacobi algorithm, we achieve performance gains of at least 16% for 64 SMP nodes.
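The idea of a tunable hybrid static/dynamic schedule can be illustrated with a minimal Python sketch. It is not the authors' MPI implementation: the function name `hybrid_schedule`, the `static_fraction` parameter, and the use of threads with a shared queue are all assumptions made for illustration. Each worker first processes a fixed contiguous block (the static part), then pulls leftover items from a shared queue (the dynamic part), so a worker slowed by noise simply takes fewer leftover items.

```python
import queue
import threading


def hybrid_schedule(num_items, num_workers, static_fraction, work_fn):
    """Run work_fn(i) for every i in range(num_items).

    Returns a list `owner` where owner[i] is the id of the worker that ran item i.
    `static_fraction` (hypothetical tuning knob) is the share of items assigned up front.
    """
    num_static = int(num_items * static_fraction)
    block = num_static // num_workers   # equal static block per worker
    num_static = block * num_workers    # any remainder joins the dynamic part

    # Dynamic part: a shared queue that any worker may drain.
    dynamic = queue.Queue()
    for i in range(num_static, num_items):
        dynamic.put(i)

    owner = [None] * num_items

    def worker(wid):
        # Static phase: a contiguous block fixed in advance (favors locality).
        for i in range(wid * block, (wid + 1) * block):
            work_fn(i)
            owner[i] = wid
        # Dynamic phase: keep taking leftover items until none remain.
        while True:
            try:
                i = dynamic.get_nowait()
            except queue.Empty:
                return
            work_fn(i)
            owner[i] = wid

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner
```

Setting `static_fraction` to 1.0 gives a purely static partition and 0.0 a purely dynamic one; values in between trade locality against tolerance to imbalance.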

    Hybrid static/dynamic scheduling for already optimized dense matrix factorization

    We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for direct methods used in dense numerical linear algebra. This strategy provides a balance of data locality, load balance, and low dequeue overhead. We show that the use of this scheduling in communication-avoiding dense factorization leads to significant performance gains. On a 48-core AMD Opteron NUMA machine, our experiments show that we can achieve up to 64% improvement over a version of CALU that uses fully dynamic scheduling, and up to 30% improvement over the version of CALU that uses fully static scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic scheduling approach is up to 8% faster than the versions of CALU that use fully static or fully dynamic scheduling. Our algorithm leads to speedups over the corresponding routines for computing LU factorization in well-known libraries. On the 48-core AMD NUMA machine, our best implementation is up to 110% faster than MKL, while on the 16-core Intel Xeon machine, it is up to 82% faster than MKL. Our approach also shows significant speedups compared with PLASMA on both of these systems.

    Low-overhead scheduling for improving performance of scientific applications

    Application performance can degrade significantly due to node-local load imbalances during application execution on a large number of SMP nodes. These imbalances can arise from the machine, operating system, or the application itself. Although dynamic load balancing within a node can mitigate imbalances, such load balancing is challenging because of its impact on data movement and synchronization overhead. We developed a series of scheduling strategies that mitigate imbalances without incurring high overhead. Our strategies provide performance gains for various HPC codes, and perform better than widely known scheduling strategies such as OpenMP guided scheduling. Our scheme and methodology allow for scaling applications to next-generation clusters of SMPs with minimal application programmer intervention. We expect these techniques to be increasingly useful for future machines approaching exascale.
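For context on the baseline named above, the following sketch shows how a guided schedule typically shrinks its chunk sizes. It assumes the common rule that each chunk is roughly the remaining iteration count divided by the number of threads; the exact proportion is left to the OpenMP runtime, and the function name `guided_chunks` and its parameters are hypothetical.

```python
def guided_chunks(num_iters, num_threads, min_chunk=1):
    """Return the sequence of chunk sizes a guided-style schedule would hand out."""
    chunks = []
    remaining = num_iters
    while remaining > 0:
        # Ceiling of remaining / num_threads, but never below min_chunk.
        size = max(min_chunk, -(-remaining // num_threads))
        size = min(size, remaining)  # the final chunk may be smaller
        chunks.append(size)
        remaining -= size
    return chunks
```

Early chunks are large, which keeps dequeue overhead low, while the small chunks near the end absorb imbalance; every chunk still goes through a shared dispatch point, which is the overhead that lower-overhead strategies aim to reduce.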

    Limb reconstruction system as a primary and definitive mode of fixation in open fractures of long bones

    Background: Management of open fractures of long bones by traditional systems is very complex. The limb reconstruction system (LRS) is considered very effective, offering rigid stabilization of fracture fragments with easy access for soft tissue care. The aim of the study was to determine the efficacy of LRS for treatment of open fractures of long bones. Methods: This prospective study included 30 cases of both sexes aged between 11 and 60 years. Patients with closed fractures of long bones and fractures treated conservatively were excluded from the study. Clinical and radiological evaluation was done at presentation and at specific intervals, assessing signs of bone union and associated complications. Results: The mean age of the patients who participated in the study was 35.6 years, with male predominance (93.3%). All patients (100%) were injured in road traffic accidents. 50% of the cases were Grade 2 fractures. The most common complication encountered was pin tract infection, seen in 8 cases. We had good results in 24 patients, moderate in 5, and poor in 1 patient using modified Anderson and Hutchinson's criteria. Conclusions: LRS is an alternative to the traditional system of fixation in the primary management of open fractures of long bones. It is less cumbersome for the patient and more patient-friendly, also reducing the financial burden. It is a definitive single-stage procedure.

    MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory

    Hybrid parallel programming with the message passing interface (MPI) for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% in the communication component of a five-point stencil solve.

    Development and evaluation of introgression lines with yield enhancing genes of the Indian mega-variety of rice, MTU1010

    MTU1010 is an early-maturing and high-yielding mega rice variety widely grown in an area of 3 Mha. It is characterised by limited grain number and panicle branching. To improve the grain number in MTU1010, an IRRI breeding line, IR121055-2-10-5, was utilized as donor to transfer the yield-enhancing genes Gn1a and OsSPL14 (associated with increased grain number and better panicle branching, respectively) into MTU1010 by Marker-Assisted Backcross Breeding (MABB). At each backcross generation, foreground selection was carried out with Gn1a- and OsSPL14-specific molecular markers, whilst background selection was done with a set of SSR markers polymorphic between IR121055-2-10-5 and MTU1010. With the use of a gene-specific marker, homozygous BC2F2 plants carrying the yield-enhancing gene were identified and advanced through the pedigree method of selection until BC2F6, and the ten best-performing lines were selected and evaluated in replicated station trials for yield-contributing traits, where grain number and branching per panicle exhibited a highly significant and positive correlation with single plant yield. Three promising lines, namely RP6353-5-8-13-24, RP6353-26-13-39-5 and RP6353-32-12-8-16, with higher grain number and yield than MTU1010 were identified and nominated for evaluation in the Initial Varietal Trial-Aerobic (IVT-Aerobic) of the All India Crop Improvement Programme on Rice (AICRP), of which RP6353-26-13-39-5 (IET28674) was promoted for further testing.